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Abstract 

This paper presents a method for selecting Regions of Interest (ROI) in brain Magnetic Resonance Imaging (MRI) for 
diagnostic purposes, using statistical learning and vector quantization techniques. The proposed method models the 
distribution of GM and WM tissues grouping the voxels belonging to each tissue in ROIs associated to a specific neurological 
disorder. Tissue distribution of normal and abnormal images is modelled by a Self-Organizing map (SOM), generating a set 
of representative prototypes, and the receptive field (RF) of each SOM prototype defines a ROI. Moreover, the proposed 
method computes the relative importance of each ROI by means of its discriminative power. The devised method has been 
assessed using 818 images from the Alzheimer's disease Neuroimaging Initiative (ADNI) which were previously segmented 
through Statistical Parametric Mapping (SPM). The proposed algorithm was used over these images to parcel ROIs 
associated to the Alzheimer's Disease (AD). Additionally, this method can be used to extract a reduced set of discriminative 
features for classification, since it compresses discriminative information contained in the brain. Voxels marked by ROIs 
which were computed using the proposed method, yield classification results up to 90% of accuracy for controls (CN) and 
Alzheimer's disease (AD) patients, and 84% of accuracy for Mild Cognitive Impairment (MCI) and AD patients. 
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Introduction 

Nowadays, neurodegenerative disorders affect over 30 million 
people around the world. Due to the increasing life expectancy 
and ageing of the population in developed nations, they are 
expected to affect 60 million people worldwide over the next 50 
years. A distinctive example of these neurodegenerative disorders 
is the Alzheimer's Disease (AD). Indeed, the latest estimates argue 
for global prevalence of AD to be quadrupled by 2050. slow, 
degenerative disease related to pathological amyloid depositions 
and hyperphosphorylation of structural proteins in the brain [1]. 
Its progression depends on every individual, and it usually begins 
showing signs of mild memory problems which turn into severe 



brain damage some years later. As for other neurological 
disorders, there is currently no cure for the AD; however early 
diagnosis may help to slow down the rapid advance of the disease. 
Although the development of the disease depends on the 
individual, aging, etc. there are many common symptoms in 
addition to structural changes in the brain. Some specific image- 
based diagnostic methods for AD and other neurological disorders 
that use functional imaging have been developed previously. Since 
AD causes loss of brain function, affecting the areas of the brain 
related to memory, thought and language, it is possible to deal 
with automatic diagnosis tools by learning patterns associated to 
brain functions. In fact, there exist some methods that aim to 
detect functional brain variations [2-6] by exploiting the 
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information contained in the images to learn patterns associated to 
cerebral damage. 

AD can usually be diagnosed by means of neuropsychological 
tests [7] , but at an early stage of the disease, when symptomatic 
cognitive problems are perceptible, a noticeable neurodegenera- 
tion has already occurred [8]. However [9,10], suggest that 
pathological manifestations of AD can be detected some years 
earlier, before cognitive problems arise. This would allow to start 
specific therapy for the patient at early stages of the disease. In 
fact, works such as [1,8,11-14] are focused in the use of both 
different statistical and artificial intelligence methods to reveal 
patterns in MCI patients relating to either structural artrophy in 
MRI or functional patterns that allow to differentiate them from 
AD or control patients (CN). The most functional imaging-based 
techniques use Single Emission Computed Tomography (SPECT) 
[2,3,15] or Positron Emission Tomography (PET) images [4,16] to 
detect decreased blood perfusion or decreased glucose metabolism 
in brain areas associated to AD. These methods aim to build 
models from a set of labelled images to be used for further 
classification of new patients. In this way, Gaussian mixture 
models (GMM) are used in [4,17] to model functional images in 
order to select regions of interest (ROIs) for AD. Unfortunately, 
GMM-based methods are not straightforward to apply for ROI 
computation in MRI due to the high number of voxels and the 
high number of Gaussian components to be estimated. Conse- 
quently, it may result unfeasible in practice. See [18] for a 
comprehensive explanation of the maximum likelihood method 
and a performance comparison with other existing methods. That 
work concludes that the maximum likelihood estimates are the 
most accurate, closely followed by the regression-type estimates, 
quantile method, and finally, the method of moments. The only 
disadvantage of the maximum likelihood estimation is that this 
technique is most computationally expensive. 

Other works such as [19,20] use principal component analysis 
to extract relevant features for AD diagnosis linked to discrimi- 
native regions, though. Similarly [21], defines association rules 
over discriminative regions. Nevertheless, such works use func- 
tional images to reveal discriminative regions and to classify brain 
images. 

Neurodegeneration relating to AD produces abnormal struc- 
tural changes in the brain, which eventually result on extreme 
shrinkage of the hippocampal volume or extreme reduction of its 
cortical thickness, as well as a severe enlargement of the internal 
ventricles. However, some structural changes associated to early 
AD are revealed to be similar to those caused by natural ageing 
process. Therefore, despite AD diagnosis can be addressed by 
detecting these structural changes [22—27], it is not straightforward 
to distinguish them in early stages of the disease, when only minor 
symptomatic cognitive dysfunctions are evidenced (i.e. progressive 
MCI patients) [28]. A review of different methods for automatic 
classification of AD using MRI images from the ADNI database 
can be found in [13]. 

Unlike methods that use intensity voxels for classification [29], 
most methods based on MRI begin with an image segmentation 
into different tissues and, later, a comparison of tissue distribution. 
Thus [30], uses an ensemble of sparse classifiers directly over the 
segmented tissues [30], without parcelling ROIs. Other works, 
however, take into account the tissue distribution in relevant brain 
regions related to AD. These regions, namely Regions of Interest 
(ROI), can be obtained by grouping voxels into anatomical regions 
using a labelled atlas [13,31,32]. ROI-based methods such as [33] 
parcel the brain into ROIs or segment the hippocampus [34,35] 
by wrapping an anatomy atlas, in order to compare tissue volumes 
in CN/AD or MCI/AD patients. These works report classification 



accuracies up to 90% on diagnosed AD patients through post- 
mortem analysis. Thus, as ROIs reveal brain areas linked to 
neurodegeneration, selection of voxels belonging to ROIs plays an 
important role in the classification task by allowing the classifier to 
be fed with the most discriminative information. Therefore, 
extracting relevant ROIs from training images that can be treated 
as markers, can finally determine the classification results. 

Current trend on studies focused on AD/CN and MCI/AD 
classification consist on using a single imaging modality. As shown 
above, some methods use functional imaging whereas other use 
structural MRI, and the analysis of each image modality can 
reveal different markers associated to the disease. For instance, 
MRI analysis reports grey matter atrophy that usually results in 
differences in both, the hippocampus and the entorhinal cortex. 
Additionally, it has been proved that early changes in the 
hippocampal volume and entorhinal cortex are related to 
evidences of early AD, while this is not so obvious in other 
cerebral structures [1,36]. In fact, our contribution shows that 
incorporating patterns revealed from differences in white matter 
helps to achieve better classification results. 

Previous works such as [37] use MRIs from each class instead of 
segmented images in order to compute ROIs. On the other hand, 
in [37], class prototypes are computed by using the LVQ, 
algorithm [38] and new images are projected onto these 
prototypes to extract features in a similar way to the dgenface 
approach [39]. Besides, in [24], a method to extract ROIs from 
MRIs avoiding segmentation is shown. In this case it is not 
necessary to perform a MRI segmentation; conversely, image 
preprocessing such as intensity normalization must be carried out. 
However, the level of accuracy provided by this method is 
significantly lower than the one provided by our approach, 
especially for MCI/ AD classification, since it deals with early AD 
diagnosis. 

In our work, structural MRI is used to automatically reveal 3D 
brain regions associated to AD by means of analyzing the 
distribution of WM and GM tissues in the brain. The computed 
ROIs are later used as a mask in order to extract voxels from 
relevant regions for subsequent analyses. Thus, regions linked to a 
specific brain disorder are automatically computed by quantizing 
the space into a number of regions that group similar voxels. This 
way, it is not necessary to get additional information from any 
anatomy atlas. Additionally, a measurement of relative importance 
is calculated for each ROI on the basis of its discriminative 
capability for AD diagnosis. This is accomplished by computing a 
number of prototype vectors using the SOM algorithm that 
quantize the space taking into account differences between CN 
and AD images. Consequently, the receptive field of each SOM 
unit represents a ROI on the image and its relative importance can 
be computed by means of the discriminative power of each voxel. 
Figure 1 shows the block diagram of the proposed method to 
extract and to selecting ROIs. The devised method has been 
evaluated in terms of its ability to classify new patients (not used in 
the training stage) correctly, through extensive experiments 
performed over the 1075-Tl ADNI database. 

The main contributions of this work can be summarized in the 
following: 

1 . A method for automatic ROI delineation is proposed, using 
SOM to quantize the feature space composed by voxel 
coordinates and information regarding voxel differences 
between classes. 

2. It automatically reveals brain areas (ROIs) according to 
patterns learnt to differentiate CN and AD patients (or 
MCI/AD patients). 
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Figure 1 . Block diagram of the proposed method for extracting 
and selecting ROIs. 

doi:10.1371/journal.pone.0093851.g001 

3. A relative importance measurement is also computed for each 
ROI, to indicate its discriminative capability. 

4. The method has been assessed through extensive experiments 
using 818 images from the 1075-Tl ADNI database, yielding 
90% (AUG 0.92) and 84% (AUG 0.84) of classification 
accuracy within GN/AD and MGI/AD, respectively. 

The rest of this paper is organized as follows. Section materials 
and methods shows the materials and methods used in this work, 
including the necessary background to support the next sections 
and details on the database used to assess the proposed method. 
Examples of ROI computation are also provided in this section. 
Experimental results section provides classification results using 
ROIs computed with the proposed method over the database 
previously described. Finally, conclusions are drawn in the 
conclusions section. 



the progression of mild cognitive impairment (MGI) and early 
Alzheimers disease (AD). Determination of sensitive and specific 
markers of very early AD progression is intended to aid 
researchers and clinicians to develop new treatments and monitor 
their effectiveness, as well as lessen the time and cost of clinical 
trials. The Principal Investigator of this initiative is Michael W. 
Weiner, MD, VA Medical G enter and University of Galifornia - 
San Francisco. ADNI is the result of efforts of many co- 
investigators from a broad range of academic institutions and 
private corporations, and subjects have been recruited from over 
50 sites across the U.S. and Ganada. The initial goal of ADNI was 
to recruit 800 subjects but ADNI has been followed by ADNI-GO 
and ADNI-2. To date these three protocols have recruited over 
1 500 adults, ages 55 to 90, to participate in the research, consisting 
of cognitively normal older individuals, people with early or late 
MGI, and people with early AD. The follow up duration of each 
group is specified in the protocols for ADNTl, ADNT2 and 
ADNI-GO. Subjects originally recruited for ADNI-1 and ADNI- 
GO had the option to be followed in ADNI-2. For up-to-date 
information, see www.adni-info.org. 

ADNI database [40] was created to study Alzheimer disease's 
progression, collecting a vast amount of MRI and Positron 
Emission Tomography (PET) images as well as blood biomarkers 
and cerebrospinal fluid analyses. The main goal of this database 
was to provide a way to diagnose early AD stages. ADNI database 
provides data for three groups of subjects: healthy individuals 
(Gontrols, GN), Alzheimer disease patients (AD) and patients 
suffering from mild cognitive impairment symptoms (MGI). The 
database that has been used in this work, contains 1075 Tl- 
weighted MRI images, comprising 229 GN, 401 MGI (312 stable 
MGI and 86 progressive MGI) and 188 AD images. Specifically, 
we have used the database ADNI 1 '.Screening 1.5T (subjects who have a 
screening data). This database contains MRI data from 818 subjects 
and repeated scans in some cases. When multiple scans of the 
same subject were available, the first one was selected. As a result, 
818 images have been used for assessing our approach. 
Demographic data of patients in the database is summarized in 
Table 1. 



Materials and Methods 

Database 

Data used in the preparation of this article were obtained from 
the Alzheimers Disease Neuroimaging Initiative (ADNI) database 
(adni.loni.usc.edu). The ADNI was launched in 2003 by the 
National Institute on Aging (NIA), the National Institute of 
Biomedical Imaging and Bioengineering (NIBIB), the Food and 
Drug Administration (FDA), private pharmaceutical companies 
and non-profit organizations, as a $60 million, 5-year public- 
private partnership. The primary goal of ADNI has been to test 
whether serial magnetic resonance imaging (MRI), positron 
emission tomography (PET), other biological markers, and clinical 
and neuropsychological assessment can be combined to measure 



Image preprocessing and co-registration 

Image data pre-processing, segmentation and co-registration of 
Tl -weighted MRI images from the ADNI database have been 
performed. Initially, images from the ADNI database were nor 
skull-stripped neither spatially normalized. Thus, all the images 
had to be pre-processed and co-registered before segmentation. 
The whole process has been performed using the VBM [41] 
toolbox for SPM. Pre-processing, co-registration and segmentation 
procedures as well as the parameters used at each stage can be 
summarized as follows: 

1 . Pre-processing and co-registration. 



Table 1. Dennographic data of patients in the database (ADNI 1075-Tl). 





Diagnosis 


Number 


Age 


Gender (M/F) 


MMSE 


CN 


229 


75.97±5.0 


119/110 


29.00 ±1.0 


MCI 


401 


74.85 ±7.4 


258/143 


27.01 ±1.8 


AD 


188 


75.36±7.5 


99/89 


23.28±2.0 


doi:1 0.1 371/joumal.pone.0093851 .tOOl 
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• The whole process was guided by means of tissue probability 
maps. A nonlinear deformation field that best overlays the 
tissue probability maps on the individual sujects' image is 
estimated. The tissue probability maps provided by the 
International Consortium for Brain Mapping (ICBM) are 
derived from 452 Tl -weighted scans, which were aligned 
with an atlas space, then corrected for scan inhomogeneities, 
and finally classified into grey matter, white matter and 
cerebrospinal fluid. The data were affine registered to the 
MNI space and down-sampled to 2 mm resolution. More- 
over, all images from the database were resized to 
121x145x121 voxels. 

• A mutual information aflfme registration with the tissue 
probability maps was used to achieve approximate align- 
ment. 

• Spatial normalization was based on a high-dimensional 
Dartel normalization and used standard Dartel template 
provided by VBM 8. 

2. Segmentation. 

• The number of Gaussians used to represent the intensity 
distribution for each tissue class was set to 2 for all grey 
matter, white matter and cerebrospinal fluid. The use of 
multiple components per tissue allows to reckon partial 
volume effects and deep GM differing from cortical GM. 

• A very light bias regularization was performed to correct 
smooth, spatially varying artifacts that modulates the 
intensity of the images. 

• Gaussian bias-smoothing was not used, though, since 
according to our experiments, non-smoothed images pro- 
vided a better classification results. 

• Warping regularization was set to 4 to determine the tradeoff 
between the two terms of the objective function for 
registering the tissue probability maps to the image to be 



processed. One term gives a function of how probable the 
data is given the parameters. The other is a function of how 
probable the parameters are, and provides a penalty for 
unlikely deformations. 

• A spatial adaptive non local means denoising filter is applied 
to the data in order to remove noise while preserving edges. 
The smoothing filter size is automatically estimated based on 
the local variance in the image. 

• A hidden Markov Random Field (MRF) with a weighting of 
0.15 was used to encode spatial constraints of neighboring 
voxels. Neighboring voxels were expected to have the same 
class labels. The prior probability of the class and the 
likelihood probability of the observation were combined to 
estimate the Maximum a posteriori (MAP). 

• Skull stripping was performed by using SPM-VBM tool and 
VBM templates. 

As a result, probability maps were obtained for each MRI in the 
database, which consisted of values in the range (0,1) for each 
voxel and related to its membership probability (WM, GM or CSF 
tissues). However, CSF distribution was not used in our 
experiments. 

Background in SOM 

The Self-Organizing Map (SOM) [38] is a well-known, peculiar 
clustering algorithm, inspired in the animal brain which seeks for 
the most representative and most economic representation of data 
and its corresponding relationships [38,42]. During the training 
stage, the prototypes keep the most representative part of the input 
data, while the units on the output space (i.e. 2D or 3D lattice) 
holding similar prototypes (in terms of Euclidean distance) are 
moved closed together into a group. Thus, some important 
features of the input space can be inferred from the output space 
[42], regarding the input space modelling, density distribution of 
the data space and feature selection. SOM training is performed in 
a competitive way so that just a single neuron wins (i.e. its 
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Figure 2. FDR values for some relevant slices on the (a) coronal and (b) axial planes for CN and AD subjects. 

doi:1 0.1 371 /journal.pone.0093851 .g002 



PLOS ONE I www.plosone.org 



4 



April 2014 | Volume 9 | Issue 4 | e93851 



Automatic ROI Selection in Structural Brain MRI 



prototype vector is the most similar one to the input data instance). 
The most similar prototype to the input data sample is called Best 
Matching Unit (BMU) and it is computed as: 



\\vk-m\\<hk-(^j\\ ^ s 



(1) 



where is the /^-sample from the input space, co/ is the i- 
prototype (i.e. the weight associated the the /-unit) and S is the 
output SOM space. 

Moreover, prototypes of neurons belonging to the neighbour- 
hood of the BMU are also updated according to 



CDj(t + 1) = (Oj(t) + (x(t)hij(t)(vk - (Oj(t)) 



(2) 



where a(^) is the learning factor and hij{t) is the neighbourhood 
function defining the unit surrounding the BMU cO/. 

y 

hij(t) is a Gaussian neighbourhood function hij(t) = e^''^ , that 
determines the units to be updated at current iteration, where dfj is 
the distance between map units / and j and Gt is the 
neighbourhood radius at time t. The learning rate follows the 

reciprocally-decreasing function a(^) = ^ ^ lQQ//y ' ^^^^^ 

the initial learning rate and T is the training length. 

SOM training is accomplished in two phases. The first one 
provides a rough organization of the map by setting the initial 
radius to round (max(l,Vnd)) (where nd is the number of SOM 
units) and the initial learning rate ao to 0. 1 . The second phase aims 
to fine-tune the map, and uses a smaller initial neighbourhood 
radius, specifically round(max(l,\/nd/2)). Moreover, the learning 
rate is also smaller in this phase, specifically ao=0.05. 

The training process generates the prototype vectors cOi which 
quantize the data manifold and represent cluster centres on the 
data, mapped to each BMU. As SOM units are located at different 
positions in a 3D lattice, the feature space is projected onto a 3D 
space. 

Statistical significance of voxels 

A MRI is composed of a high number of voxels according to a 
previously-stated image resolution. Images in the ADNI database 
used in this work contain 121x145x121 voxels after normaliza- 
tion. Since not all the voxels result equally discriminative for 
classification [21], they could be ranked according to a specific 
discriminative criterion. Thus, the Fisher Discriminant Ratio (FDR) 
[43] which is characterized by its separation capability as shown in 
[43,44] has been used to compute the discriminative power of each 
voxel. In the two-class separation case, FDR is defined as 



FDR-- 



(3) 



where jj.^ and are the mean and variance values of each input 
variable belonging to class s, respectively. In our case, JlI^ 
represents the mean image computed by averaging the intensity 
of the voxels in each definite position (x^,x^,x^) for all the images 
belonging to class s in the training set. Similarly, cr^ represents the 
variance image computed by taking into account the intensity of 
the voxels in a specific position (x^,x^,x^) for each image 
belonging to class s in the training set. Mean and variance images 
for class s can be computed as: 



fi,{x\x^,x^)= —— Il{x\x^,x^) 



(4) 



img f^^^ 



where Nffng is the number of images in the training subset and 
I^{x^,x^,x^) is the intensity at position (x^,x^,x^) for the image k. 
Similarly, the variance image is defined as 



o^.(x\x',x')= — J2 {Il(x\x\x')-n,{x\x\x')f (5) 



^^img 

From equations 4 and 5, a FDR image which contains the 
corresponding FDR value at each precise voxel position can be 
defined. 

FDR value increases as the variable results in more discrimi- 
native values between the two classes. FDR values are further 
incorporated to the ROI computation stage to calculate the 
relative importance of each ROI. At this point, it is important to 
highlight that FDR is considered a part of the training stage. Thus, 
it is computed by using only images from the training subsets 
during cross-validation, ensuring that test data is never used for 
training. 

Figure 2 shows the FDR values computed for some relevant 
slices on the coronal and axial planes, according to the scale shown 
in the colorbar. In these figures relevant areas related to the 
Alzheimer's disease such as the hippocampus appear with a high 
FDR value, indicating a higher discriminative power over other 
areas. 

Feature generation and ROI modelling 

Due to the high number of voxels present in a MR image, 
methods that aim to compress or synthesise the information 
contained in these voxels in a reduced number of features allow 
the diminishing of the classification task's computational cost. 
Regardless of this process, non-relevant features can be taken out. 
One of the classical tools for dimensionality reduction is the well- 
known Principal Component Analysis (PGA) [43]. PC A can be 
used to generate a reduced number of features from the MRI 
voxels in order to improve both, classification accuracy and 
computational effectiveness. In this paper we present a new 
approach that generates ROIs by quantizing the MRI space by 
using a SOM, and results using PGA and Voxel as Features (VAF) 
methods are provided as baselines for comparison. 

Principal Component Analysis. PC A generates an ortho- 
normal basis vector indicating the maximum variance directions. 
Thus, the projection onto this basis maximizes the scatter of all the 
projected samples. The PC A method can be briefly described as 
follows. Let X= [x\,X2,...,Xn] be the sample set of training vectors, 
where TV is the number of patients in the MRI database. Thus, 
each image is converted into a column vector that comprises the 
intensities of each voxel. After normalizing X to zero mean and 
unity variance, we obtain a new whithened dataset Y [43] and its 
covariance matrix can be computed as ^= YY^ . The 
eigenvectors 0 and eigenvalues X of Z can be computed by 
solving Z<I> = ^X. Thus, the eigenvectors or principal components 
(PCs) of the covariance matrix, define the directions of maximum 
variance of the data manifold. Usually, the eigenvectors are chosen 
in increasing-variance order, in such a way that the first 
eigenvectors compress the most part of the variance explained 
[43]. Subsequently, the projections of the data samples onto the 
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Figure 3. SOM units after training. The position of each unit represent the cluster centre in the brain, indicating areas with similar intensities. 
Intensity difference associated to each cluster is indicated using colours according to the colorbar. (a) and (b) show the model for GM and WM 
respectively. 

doi:1 0.1 371 /journal. pone.0093851.g003 

low dimensional space spanned by the principal components are 
used in the classification task. 

ROI modelling using SOM. The method presented in this 
paper aims not only to reduce the dimensionality of the input 
space but also to compute ROIs and then weight them according 
to their relative importance. The core idea of the method consists 
on quantize the MRI space by a number of prototypes which 
model the intensity distribution in different brain areas. Thus, 
information about intensity is incorporated to the quantization 
process in order to group the voxels according to their intensity 
level and being later capable to model the MRI intensity 
distribution in the SOM space. Therefore, the feature space is 
composed by vectors which contain the voxel's three-coordinates 
location in the three first coordinates and the intensity difference 
between CN and AD images. Such a difference image is computed 
as 

lD = I'cN-nD (6) 

where and /^^ are the mean image of normal and AD images, 
respectively. 

The feature space to be quantized is then formed by vectors 
Xi = {x\,x],x]jdX where {x\,x],x]) are the coordinates of the 
voxel / (i.e. x^,x^,x^Jd^ are components of Xi) and 7^/. is the 
intensity of voxel / in the difference image. Prototypes computed 
by means of vector quantization are cluster centres and represent 
groups of similar instances according to the Euclidean distance in 



the feature space (voxels), namely Receptive Field (RF). From the 
BMU concept described in section, the RF of a unit / {RFi) can be 
defined as 

RFi = {vk 6 M : \\vk-wj\\ < H-oMl'^j 6 G, i^j) (7) 

denoting the set of input data vectors for which co/ is the BMU. 
Subsequently, the relative importance of each ROI is computed 
according to the following equation 

Rrf„= I(Xi)*FDRi (8) 

Xj G RFfi 

where /(x/) and FDRf denote the intensity and the FDR value 
respectively, of the voxel at coordinates (x] ,x^,x^). Figure 3 shows 
the SOM units after training when using a 3D cylindrical (infinite 
plane) lattice [38]. The first three coordinates of the SOM 
prototypes determine the position of each unit which represents a 
cluster centre. The fourth coordinate is depicted using different 
colours indicating the importance of the ROI according to the 
colorbar. 

The core idea behind the presented method consists on 
quantizing the space to a number of model vectors, and to 
categorize them according to their relative importance and the 
level of intensity in the difference image. Consequently, the 
number of extracted ROIs corresponds to the number of units in 




Figure 4. 3D reconstruction from ROIs computed using the SOM model for (a) GM and (b) WM tissues. 

doi:1 0.1 371 /journal.pone.0093851 .g004 
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Figure 5. Projection of images in the SOM GM model for (a) CN and (b) AD example patients from the database. Units indicate the 
position of ROI centres and R value is encoded according to the corresponding colour bar. 
doi:1 0.1 371 /journal. pone.0093851.g005 



the SOM, and the size of each ROI depends on the size of the 
receptive field of the corresponding SOM unit. 

The self-organizing process carried out by SOM provides the 
models shown in Figure 3, where each unit represents a cluster 
centre in the brain, indicating areas with similar intensities in the 
difference image. Thus, models for GM and WM are built as 
shown in Figures 3(a) and 3(b) respectively. As previously 
indicated, feature vectors used as inputs for the SOM are 4- 
dimensional. The fourth dimension in the feature space is the 
intensity difference for each voxel, indicated using a colour code in 
the prototypes according to the colour bar at the right side in the 



figures. Images from GN or AD patients can be projected onto the 
SOM space showing the relative importance of each unit (that 
indicates ROI centre) by means of R value computed according to 
equation 8. 

A 3D volume can be reconstructed from the SOM model by 
means of the vector quantization process as depicted in Figure 4, 
where the RFs of SOM units define the size of each ROI. In this 
figure, 3D models for GM (a) and WM (b) are shown. 

Figures 5(a) and 5(b) show the position of the ROI centres for 
GN and AD patients respectively and units are coloured according 



^iT. ^ .ijr-v ^t^fi^ 
^ 

C-S^ ' *.r5 
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Figure 6. Difference image (right column) and ROIs computed by the proposed method (left) for (a) GM and (b) WM, respectively 
for CN/NOR images. Images from a random training subset from the cross-validation folds have been used. ROIs are coloured according to the 
colorbar scale due to their relative importance. 
doi:1 0.1 371 /journal. pone.0093851.g006 
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Figure 7. Difference image (right column) and ROIs computed by the proposed method (left) for (a) GM and (b) WM, respectively, 
for MCI/AD images. Images from a random training subset from the cross-validation folds have been used. ROIs are coloured according to the 
colorbar scale due to its relative importance. 
doi:1 0.1 371 /journal. pone.0093851.g007 



to its R value. In these figures, different patterns associated to CN 
and AD are revealed and they are clearly shown. 

Figures 6(a) and 6(b) show the difference between CN and AD 
images in the left column and the ROIs computed in the right 
column. Each ROI is marked with a different colour that indicates 
the relevance of that ROI according to its discriminative power. 
ROIs depicted in Figure 6 have been computed using FDR values 
above 90% of its maximum value. 

Similarly, Figures 7(a) and 7(b) show difference images and 
ROIs coloured according to its discriminative power for MCI/AD 
images. It is worth mentioning that most relevant ROIs computed 
by the proposed method, specially in GM, are compatible with 
areas that appear in literature as representative regions of AD, 
located in the temporal lobe, such as hippocampus and the 
superior temporal gyrus which are responsible for the individual's 
memory formation, speech perception, and language skills [45,46]. 
In the case of MCI patients, the method truly reveals structural 
changes mainly focused on the hippocampus area. 

Classification using Support Vector Machine (SVM) 

Classification of the feature vectors consisting of the relative 
importance measure computed as indicated in the ROI modelling 



section is accomplished by means of Support Vector Machine 
(SVM). SVMs were introduced in 70's by Vapnik [47] and 
consisted of a set of supervised learning methods widely used for 
classification and regression [47,48], which were designed to split 
off a set of binary-labelled data by means of a hyperplane. 
Specifically, they compute the maximal margin hyperplane to 
achieve maximum separation between classes. SVMs operate by 
building a decision function in the form / : ^ { + 1 } using n- 
dimensional training vectors X/ and class labels //: 

(xi,/i),(x2,/2),...,(x.„4)erx{+l} (9) 

in such a way that / is able to correctly classify new samples (x,/). 
Linear discriminative functions define decision hyperplanes in a 
multidimensional feature space: 

g{x) = v^x + vo (10) 

where v is the weight vector and Dq is a bias (threshold). This way, 
D^x -\-Do>\ if class 7/ = + 1 and o^x -\-Do<\ if class = — 1, and 
the weight vector v is orthogonal to the decision hyperplane. 
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Figure 8. Classification results for CN/AD. 
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Figure 9. Classification results for MCI/AD. 
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Finding the optimal separating hyperplane is accomplished by an 
optimization task consisting of finding the unknown parameters 
Vi,i= \,..,n which define the decision hyperplane. Let Xi,i= \,...,s 
be the feature vectors of the training set X. These belong to one of 
the two classes, either Vi or V2. If the classes are linearly separable, 
the objective would be to design a hyperplane that correctly 
classified all the training vectors. As a result, that hyperplane is not 
unique and the optimization process focuses on maximizing the 
generalization performance of the classifier, which can be defined 
as the effectiveness of the classifier to operate with new data. 
Among some other criteria, the maximal margin hyperplane is 
usually selected since it provides the maximum margin of 
separation between the two classes. Since the distance from a 
point / to the hyperplane is given by z= |g(x)|/ to scale v and 
Do so that the value of g(x) is + 1 for the nearest point in Vi and 
— 1 for the nearest points in V2, reduces the optimization problem 
to maximizing the margin 2/||?;|| with the constrains: 



^X-\-Do> l,Vx G Vi 



^X-\-Do< 1,VX G 02 



(11) 



(12) 



Thus, designing the classifier leads to a non-linear (quadratic) 
optimization task subject to a set of linear inequality constrains. 
The solution v is found to be a linear combination of TVy <^ feature 
vectors and the optimum hyperplane is called the support vector 
machine. 

For non-linearly separable data, the optimization process needs 
to be modified to work in combination with kernel techniques, so 



that the hyperplane that defines the SVM corresponds to a non- 
linear decision boundary in the input space. The use of kernels 
enables to map the data into some other dot product space F 
(namely, feature space) through a non-linear transformation 
(j) : IR^^i^, and perform the described linear algorithm in F. 
Thus, the decision function is nonlinear in the input space and 
takes the form 



Ns 



/(x) = sgnV^ ciiliK(si, x) + 



^^0 



(13) 



where parameters a/ and si are the solution for the optimization 
process, solved by either Quadratic Programming (QP) or the well- 
known Sequential Minimal Optimization (SMO) [49] and the 
support vectors [47] (i.e. the training vectors that are closest to the 
linear classifier since lie on either of the two hyperplanes, i.e. 
D-^x + f 0 = + 1 . In our case, we used a Radial Basis Function 
(RBF) K(x, l) = exp{ — y\\(x — l\\^) as kernel function. In fact, the 
use of radial basis functions as kernels in SVMs have proved to 
supply better results than linear kernels for different applications 
[11,50]. 

Results and Discussion 

To demonstrate the relevance of the proposed ROI selection 
method for diagnosis purposes, we applied the devised ROI 
selection algorithm to the overall database described in the 
Database subsection, including CN, MCI and AD patients. Such a 
procedure aims to objectively evaluate the discriminative power of 
the computed ROIs. Thus, classification through SVM supervised 



Table 2. Classification results for different voxel selection methods. 




Measure 


Proposed method 


PCA-SVM 


VAF-SVM 


CN/AD 


Accuracy 


0.90±0.06 


0.82 ±0.04 


0.67 ±0.04 


Sensitivity 


0.87 ±0.07 


0.80±0.08 


0.65 ±0.08 


Specificity 


0.92 ±0.09 


0.84±0.06 


0.70±0.05 


MCI/AD 


Accuracy 


0.83 ±0.06 


0.70±0.06 


0.61 ±0.07 


Sensitivity 


0.82 ±0.07 


0.61 ±0.08 


0.60 ±0.08 


Specificity 


0.87 ±0.09 


0.72±0.10 


0.75 ±0.09 
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Figure 10. ROC curves for (a) CN/AD and (b) MCI/AD classification, respectively. 
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learning is carried out to group the images according to the value 
of R obtained in equation 8 for each ROI. All the classification 
experiments were performed considering different subsets for cross 
validation. Specifically, sets of CN/AD and MCI/AD patients 
were obtained from the data manifold using k-fold, which takes k 
images for testing and use the rest for training. As usual, we used 
10-fold (k= 10) in our experiments. Results for CN/AD and 
MCI/AD classification are shown in Figures 8 and 9, respectively. 
These results were compared to the Voxels As Features (VAF) 
technique [5,51] which uses all the intensity voxels to feed the 
classifier in both, training and testing stages. In other words, VAF 
does not concern any feature generation or selection technique. 
On the contrary, scaled versions of the probability maps obtained 
as indicated in the Image Preprocessing section from segmented 
images, were used as features. Since VAF uses all the information 
contained in the images without selecting discriminative voxels, it 
may decrease the classification performance. Our method, which 
uses voxel selection and feature generation associated to ROIs, 
outperforms the VAF approach as shown in Figure 8. Addition- 
ally, experiments using PC A as described in the Statistical significance 
of voxels section, were performed for comparison. Our proposed 
method also outperforms the approach which uses PCA for feature 
extraction as shown in Table 2 as well. The experiments described 
above state that computed ROIs effectively indicate brain areas 
concerning AD, as above described. In fact, hippocampus and 
primary auditory cortex areas are marked as relevant regions in 
the computed ROIs. We also found that ventricles were marked 
too, which would correspond to brain areas as indicated in the 
literature [45,46,52] affected by atrophy process linked to 
neurodegeneration in AD. 

Similarly to the experiments described above for CN/AD 
patients, experiments that aim to differentiate between MCI and 
AD patients were carried out. Thus, MCI/AD classification results 
are shown in Figure 9. In this case, as image differences between 
MCI and AD are not so evident as in the CN/AD case, specific 
feature generation and reduction methods may result especially 
relevant to improve classification outcomes. Moreover, brain areas 
revealed through differences between MCI and AD patients are 
not as clear as in the CN/ AD case, and ROI selection helps figure 



out patterns relating to MCI. In this way, computed ROIs show 
differences mainly focused in the hippocampus area, stating 
differences associated to the atrophy in this brain area. 

At the same time, ROIs extracted from MCI patients have been 
assessed by means of classification experiments. Figures 8 and 9 
shows the performance of our method, which clearly outperforms 
the VAF approach. Moreover, these results set the stability in 
terms of the number of SOM prototypes of the proposed approach 
to compute ROIs and their relevance for classification. 

Figure 8 also states the effectiveness of the proposed method for 
extracting relevant brain areas relating to to AD, and the 
assignment of a relative relevance value to each one of these 
regions. Thus, accuracy values up to 90% and sensitivity values up 
to 87% were obtained, outperforming the classification results 
obtained when features are computed by means of classical 
techniques such as PCA. On the other hand, classification results 
between MCI and AD are shown in Figure 9, providing 83% of 
accuracy and 76% of sensitivity levels. Moreover, Figure 10(a) 
shows the Receiving Operating Curve (ROC) for MCI/AD 
classification, yielding an Area Under ROC curve (AUC) of 0.92. 
Similarly, ROC curve for MCI/AD is depicted in Figure 10(b), 
where the computed AUC was 0.84, indicating high sensitivity for 
AD patients. 

It is worth noting that there are several factors that may affect 
the classification results. One of them is related to the gold-standard 
diagnosis in the ADNI database, as only living subjects were 
analysed. This fact is specially relevant due to the difficulty for 
diagnosing AD in vivo [8,51]. In fact [51], shows that AD 
classification accuracy level diminishes when autopsy data is not 
available. In other words, patients whose medical records are 
contained in the ADNI database do not usually show severe AD, 
but mild AD symptoms [8]. This way, as results provided in this 
work are close to the ones obtained in other recent works such as 
[13]. Therefore, it can be stated that differences in MCI patients 
determine the effectiveness of ROI computation. Consequently, 
the experiments brought about within this work show the 
importance of the feature selection methods in AD diagnosis 
through MRI imaging, which are even more relevant in the case of 
MCI classification. 
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Conclusions 

In this work, a method based on unsupervised vector 
quantization techniques for automatic ROI calculation is present- 
ed. Specifically, a three-dimensional Self-Organizing Map is used 
to model MRI images selecting ROIs related to a particular 
neurological disorder. Images from controls and AD patients are 
used to compute tissue differences by means of voxel contrastation. 
Subsequently, the MRI space is quantized, computing a number 
of prototype vectors from features extracted taking into account 
the spatial relationship among voxels, figuring out similar areas 
within the difference image. Additionally, the relative importance 
of each cluster is computed by means of the cluster sizes and the 
statistical significance of the discriminative power of the voxels that 
make up the ROI. The spatial relationship among clusters is also 
preserved in the SOM output space due to the SOM's topology 
preserving properties. The method has been assessed by selecting a 
set of images from the ADNI [40] database, using the ROIs 
automatically computed using the presented approach. Moreover, 
most relevant ROIs computed across this method, specially in 
GM, are compatible with areas that appear in literature as 
representative regions of AD such as hippocampus [45,46,52]. 
The classification results provide average accuracy, sensitivity and 
specificity values of up to 90%, 87% and 92%, respectively, for 10 
cross-validation folds, while the AUG is 0.92. On the other hand. 
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our method is capable to distinguish MCI from AD patients up to 
83% of accuracy and AUG of 0.84. Classification experiments 
state the effectiveness of the proposed method to select relevant 
brain areas that are related to Alzheimer's disease. Since structural 
differences between MCI and AD are not as clear as in the CN/ 
AD case, results obtained for MCI/AD classification indicate the 
applicability of the method for an early diagnosis of AD. 
Furthermore, it is also devised to be applicable to uncover some 
other hidden neurological disorders, through small structural brain 
alterations, as well as be useful to explore adjacent brain areas 
related to such disorders. 
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